Abstract: Using the concept of discrete noiseless channels, 
it was shown by Shannon in A Mathematical Theory of Communication 
that the ultimate performance of an encoder for a 
constrained system is limited by the combinatorial capacity of 
the system if the constraints define a regular language. In the 
present work, it is shown that this is not an inherent property of 
regularity but holds in general. To show this, constrained systems 
are described by generating functions and random walks on trees. 

I. INTRODUCTION 

A constrained system allows the transmission of input 
sequences of weighted symbols that fulfill certain constraints 
on the symbol constellations. Constrained systems have been 
of recent interest, e.g., in the context of storage systems [1]. A 
natural question is how to efficiently encode a random source 
such that it becomes a valid input for a constrained system 
[2]. Furthermore, it is of interest to determine the ultimate 
performance of such an encoder. This leads to the notion of 
the capacity of constrained systems. 

Previous work: Shannon [3] investigated the capacity of 
constrained systems within the framework of the discrete 
noiseless channel (DNC). For the case where the constraints 
form a regular language [4], it was stated in [3, Theorem 8] 
that the maximum entropy rate R of a valid input process is 
equal to the combinatorial capacity C, which is defined as 

$$C = \limsup_{\nu \to \infty} \frac{\ln N(\nu)}{\nu} \qquad (1)$$

where $\nu$ denotes the length of the sequences and $N(\nu)$ denotes 
the number of distinct sequences of length $\nu$ that are accepted 
by the considered DNC. Here and hereafter, ln denotes the 
natural logarithm. A detailed proof of the equality between 
R and C was recently given in [5]. This proof is heavily 
based on the regularity of the constraints. However, it is 
not clear whether this equality is an inherent property of 
regular languages or whether it holds in general. It should be 
noted that sequences with non-regular constraints have been 
of research interest recently, e.g., in [6]. An early treatment of 
DNCs can be found in [7]. 



Contributions: In this paper, we use the framework of 
general DNCs as introduced in [8] to show the following. If 
the set of valid input sequences for a constrained system can 
be generated by a Markov process, then the maximum entropy 
rate of such a process is given by the combinatorial capacity of 
the system, irrespective of whether the constraints are regular 
or not. Our result can be seen as a generalization of Shannon's 
result [3, Theorem 8] to general DNCs and in particular 
non-regular DNCs. Furthermore, since our derivations also apply 
to the regular case, they also serve as a new way to derive 
[3, Theorem 8]. 

The remainder of the paper is organized as follows. In 
Section II, we present the framework of general DNCs and the 
calculation of combinatorial capacities by generating functions 
as introduced in [8]. We then define in Section III Markovian 
input processes and entropy rates for general DNCs. In 
Section IV, we define the maximum entropy rate R of general 
DNCs and, for the sake of illustration, we show for two simple 
examples that R is equal to the combinatorial capacity C. 
Finally, in Section V, we prove that R = C holds for general 
DNCs. 

II. DISCRETE NOISELESS CHANNELS 

To calculate the combinatorial capacity of general DNCs, 
we interpret generating functions as functions on the complex 
plane and investigate their convergence behavior. This 
approach, mostly referred to as analytic combinatorics, is 
discussed in detail in [9]. We consider a more general case 
since we allow non-integer valued symbol weights. In order 
to handle this situation, we use general Dirichlet series [10] 
instead of Taylor series as generating functions. 

A. Definitions and Notation 

Our definition of DNCs as presented next mainly follows 
the one given in [8]. 

Definition 1. A DNC $\mathsf{A} = (A, w)$ consists of a countable 
set A of strings accepted by the channel and an associated 
weight function $w : A \to \mathbb{R}_{\oplus}$ ($\mathbb{R}_{\oplus}$ denotes the nonnegative real 
numbers) with the following property. If $a, b \in A$ and $ab \in A$ 
(ab denotes the concatenation of a and b), then $w(ab) = w(a) + w(b)$. 
By convention, the empty string $\epsilon$ is always 
an element of A and the weight of $\epsilon$ is equal to zero, i.e., 
$w(\epsilon) = 0$. 

Definition 2. Let $\mathsf{A} = (A, w)$ represent a DNC. We define 
the generating function of $\mathsf{A}$ by 

$$G_{\mathsf{A}}(s) = \sum_{a \in A} e^{-w(a) s}, \quad s \in \mathbb{C} \qquad (2)$$

where $\mathbb{C}$ denotes the set of complex numbers. 



Let $\Omega$ denote the set of distinct string weights of elements 
in A. We order and index the set $\Omega$ such that $\Omega = \{\nu_k\}_{k=1}^{\infty}$ 
with $\nu_1 < \nu_2 < \cdots$. For every $\nu_k \in \Omega$, $N(\nu_k)$ denotes the 
number of distinct strings of weight $\nu_k$ that are accepted by 
the channel. We can now write the generating function as 

$$G_{\mathsf{A}}(s) = \sum_{k=1}^{\infty} N(\nu_k) e^{-\nu_k s}. \qquad (3)$$

Since the coefficients $N(\nu_k)$ result from an enumeration, they 
are all nonnegative. The combinatorial capacity of a DNC as 
defined in (1) can now be written as 

$$C = \limsup_{k \to \infty} \frac{\ln N(\nu_k)}{\nu_k}. \qquad (4)$$
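
As a small numerical illustration of (4), the following sketch counts the strings of each weight for an unconstrained binary alphabet and evaluates the ratio $\ln N(\nu_k)/\nu_k$. The symbol weights 1 and 2 and the function names are assumptions chosen for illustration; the counting recursion only holds for positive integer weights and free concatenation.

```python
import math


def count_strings_by_weight(symbol_weights, max_weight):
    """Return counts[v] = number of strings of total weight v, v = 0..max_weight.

    Illustrative assumption: positive integer symbol weights and no
    constraint other than the alphabet itself (free concatenation).
    """
    counts = [0] * (max_weight + 1)
    counts[0] = 1  # the empty string has weight zero
    for v in range(1, max_weight + 1):
        # Every string of weight v ends in some symbol of weight w <= v.
        counts[v] = sum(counts[v - w] for w in symbol_weights if w <= v)
    return counts


def capacity_estimates(counts):
    """Return the ratios ln N(v) / v for every weight v >= 1 with N(v) > 0."""
    return {v: math.log(n) / v for v, n in enumerate(counts) if v >= 1 and n > 0}


counts = count_strings_by_weight([1, 2], 200)
estimates = capacity_estimates(counts)
golden_ratio = (1 + math.sqrt(5)) / 2
print(estimates[10], estimates[200], math.log(golden_ratio))
```

For these weights the counts follow the Fibonacci recursion, so the ratio approaches the logarithm of the golden ratio as the weight grows.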



B. DNCs of Interest 

Throughout this paper, we restrict our attention to DNCs 
where the ordered set of string weights $\{\nu_k\}_{k=1}^{\infty}$ is not too 
dense, that is, there exists some constant $L > 0$ and some 
constant $K > 0$ such that for any integer $n > 0$ 

$$\max_{\nu_k < n} k < L n^K. \qquad (5)$$

Otherwise, the number of possible string weights in the 
interval [n, n + 1] increases exponentially with n, in which 
case the definition of combinatorial capacity given in (4) is 
not appropriate. This is illustrated in the following example. 

Example 1. Let $N(\nu_k)$ denote the coefficients of the generating 
function of some DNC. Assume $N(\nu_k) = 1$ for all $k \in \mathbb{N}$ 
and assume further 

$$\max_{\nu_k < n} k = \lceil R^n \rceil \qquad (6)$$

for some $R > 1$. According to (4), the capacity of the DNC 
is then equal to zero because $\ln N(\nu_k) = 0$ for all $k \in \mathbb{N}$. 
However, the channel accepts $R^n$ distinct strings of weight 
smaller than n. The average amount of data per string weight 
that we can transmit over the channel is thus lower-bounded 
by $\ln R^n / n = \ln R$, which is according to the assumption 
greater than zero. ◁

For a DNC $\mathsf{A} = (A, w)$ where A is generated over a finite 
set of symbols, the restriction (5) is automatically fulfilled [5, 
Appendix A], implying that virtually any constrained system 
of practical interest fulfills (5). Not too dense sequences have 
another interesting property, which we will need in our later 
derivations. We state it in the following lemma. 




Fig. 1. Two different representations of the DNC $\mathsf{A} = (A, w)$ by a tree. 
The DNC $\mathsf{A}$ is given by $A = \{\epsilon, t, u, tu\}$ with $w(t) = w(u) = 1$. 



Lemma 1. If a series $\{a_k\}_{k=1}^{\infty}$ is not too dense and if 
$0 < x < 1$, then the series $\sum_{k=1}^{\infty} x^{a_k}$ converges. 

See [5, Appendix A] for a proof of this lemma. 

C. Calculating the Capacity 

For a DNC of interest, we want to calculate the combinatorial 
capacity as given in (4). An explicit formula for regular 
DNCs was provided in [3, Theorem 1]. A detailed derivation of 
this formula for DNCs with regular constraints and non-integer 
valued symbol weights can be found in [5]. In [8], it was 
shown that the combinatorial capacity (4) is determined by the 
region of convergence (r.o.c.) of the corresponding generating 
function for any DNC with the set of possible string weights 
$\{\nu_k\}_{k=1}^{\infty}$ being not too dense. We restate this theorem here. 

Theorem 1. Let $\mathsf{A} = (A, w)$ be a DNC with the generating 
function $G_{\mathsf{A}}(s)$. The combinatorial capacity C of $\mathsf{A}$ is given 
by C = Q where $\Re\{s\} > Q$ ($\Re\{s\}$ denotes the real part of 
s) is the r.o.c. of $G_{\mathsf{A}}(s)$, that is, 

$$C = \limsup_{k \to \infty} \frac{\ln N(\nu_k)}{\nu_k} = Q. \qquad (7)$$



Theorem 1 applies for general DNCs with possibly non-integer 
valued symbol weights and arbitrary constraints on the 
symbol constellations. It can be interpreted as the general form 
of the Exponential Growth Formula. In [9, Theorem IV.7], 
the Exponential Growth Formula was stated for DNCs with 
integer valued weights and arbitrary constraints. The latter 
version of the Exponential Growth Formula was used in [6] to 
calculate the combinatorial capacity (4) of a non-regular DNC 
with integer valued symbol weights. 
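
The role of the region of convergence can be checked numerically. The sketch below evaluates partial sums of the series (3) for an illustrative DNC that accepts all binary strings with assumed symbol weights 1 and 2, for which $N(\nu)$ follows the Fibonacci recursion and the capacity is the logarithm of the golden ratio. The function name and the offsets of 0.1 around the capacity are hypothetical choices.

```python
import math


def partial_sum(s, max_weight):
    """Partial sum of sum_v N(v) * exp(-v * s) for v = 0..max_weight.

    Illustrative assumption: N(v) counts binary strings with symbol
    weights 1 and 2, so N(0) = N(1) = 1 and N(v) = N(v-1) + N(v-2).
    """
    n_prev, n_curr = 0, 1  # N(-1) = 0 by convention, N(0) = 1
    total = 0.0
    for v in range(max_weight + 1):
        total += n_curr * math.exp(-v * s)
        n_prev, n_curr = n_curr, n_prev + n_curr
    return total


capacity = math.log((1 + math.sqrt(5)) / 2)
s_above = capacity + 0.1  # inside the region of convergence
s_below = capacity - 0.1  # outside the region of convergence

print(partial_sum(s_above, 400), partial_sum(s_above, 800))
print(partial_sum(s_below, 400), partial_sum(s_below, 800))
```

Above the capacity the partial sums settle at $1/(1 - e^{-s} - e^{-2s})$, while below it they keep growing with the truncation weight.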

III. INPUT SOURCES FOR DNCs 

The purpose of this section is to define Markovian input 
processes and the corresponding entropy rates for general 
DNCs. First, we represent the set of strings that are accepted 
by a general DNC by a tree and second, we define a Markovian 
input process as a walk on this tree and give a formula for 
its entropy rate. We postpone the problem of finding the 
maximum entropy rate to the next section. 

A. Representing DNCs by Trees 

We represent a DNC $\mathsf{A} = (A, w)$ by a tree $T_{\mathsf{A}}$ consisting 
of a root, labelled and weighted branches, and paths resulting 
from the concatenation of branches. We restrict our 
considerations to paths that start at the root. For each such path, 
we display its label at the corresponding end node. We do 
not allow distinct paths to have the same label. A DNC $\mathsf{A}$ 
is represented by a tree if there is a one-to-one mapping 
from A to the path labels. Note that only the set of paths in 
$T_{\mathsf{A}}$ is uniquely determined by this mapping, but not how these 
paths are formed by branches. See Figure 1 for an example 
of this ambiguity. In this figure, a branch is represented by 
an arrow, its weight by the distance between start and end 
node, and its label is written above the arrow. Notice that the 
set of paths represented by the node labels displayed in the 
rectangles is the same for the tree in Figure 1i and the tree in 
Figure 1ii. The DNC has a finite set A of accepted sequences; 
therefore, the tree representations are finite. However, DNCs of 
non-zero combinatorial capacity have infinite sets of accepted 
strings and as a consequence also infinite tree representations. 
Surprisingly, we will see in the following that although the 
tree representation of a DNC is not unique, as long as it 
allows the definition of a Markov input source, the maximum 
entropy rate of this source will not depend on the chosen tree 
representation. 

B. Markovian Input Sources 

For a DNC $\mathsf{A} = (A, w)$, we assume that every branch in the 
tree representation $T_{\mathsf{A}}$ has subsequent branches. We can then 
define an input source by a Markov process $X = \{X_i\}_{i=1}^{\infty}$, 
where $X_i$ chooses randomly among the branches that start at 
the end node of the realization of $X_{i-1}$. Every realization of 
$X^{(l)} = (X_1, \ldots, X_l)$ is thus a path in $T_{\mathsf{A}}$ starting at the root 
and consisting of l branches. The support of $X^{(l)}$ is given by 
the set of all such paths $x^{(l)}$ and we denote it by $\mathcal{X}^{(l)}$. Note 
that for $\mathsf{A} = (A, w)$, we have 

$$A = \bigcup_{l=1}^{\infty} \mathcal{X}^{(l)}. \qquad (8)$$

Whenever it follows directly from the context, we omit for 
simplicity the superscript l and write x instead of $x^{(l)}$. For all 
$x \in \mathcal{X}^{(l)}$, we have for the probability mass function (PMF) 
$p_{X^{(l)}}$ of $X^{(l)}$ 

$$p_{X^{(l)}}(x) = P[X_1 = x_1] \prod_{i=2}^{l} P[X_i = x_i \mid X_{i-1} = x_{i-1}]. \qquad (9)$$

We conclude that the existence of a tree representation $T_{\mathsf{A}}$ 
where each branch has subsequent branches is equivalent to 
the existence of a Markovian input source for $\mathsf{A}$. Note that 
regular DNCs can be represented by finite state machines 
(FSMs) [4] and the tree representation can be obtained from 
the corresponding FSM. The resulting tree representation then 
automatically has the property that each branch has subsequent 
branches. 

Following Il3|,|l5|, the entropy rate H of X is given by 

$$H(X) = \limsup_{l \to \infty} \frac{H(X^{(l)})}{L_l} \qquad (10)$$

where $L_l$ is equal to the average weight of all $x \in \mathcal{X}^{(l)}$ with 
respect to (w.r.t.) the PMF of $X^{(l)}$ and where $H(X^{(l)})$ denotes 
the entropy of $X^{(l)}$ in nats. 
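
To make (9) and (10) concrete, the following sketch enumerates all paths of a given length for a small first-order Markov source and evaluates the entropy per average weight. The source itself is a hypothetical example, not one taken from the paper: binary symbols in which a 1 is never followed by another 1, with assumed weights 1 and 2 and arbitrarily chosen transition probabilities.

```python
import math

# Hypothetical Markov source (illustrative values only).
initial = {"0": 0.6, "1": 0.4}
transition = {"0": {"0": 0.6, "1": 0.4}, "1": {"0": 1.0}}
weight = {"0": 1.0, "1": 2.0}


def path_pmf(length):
    """Map every path of `length` branches to its probability, following (9)."""
    paths = {(symbol,): p for symbol, p in initial.items()}
    for _ in range(length - 1):
        extended = {}
        for path, prob in paths.items():
            for nxt, p in transition[path[-1]].items():
                extended[path + (nxt,)] = prob * p
        paths = extended
    return paths


def entropy_per_average_weight(length):
    """Return H(X^(l)) / L_l in nats per unit weight, the l-th term in (10)."""
    pmf = path_pmf(length)
    entropy = -sum(p * math.log(p) for p in pmf.values() if p > 0)
    avg_weight = sum(p * sum(weight[s] for s in path) for path, p in pmf.items())
    return entropy / avg_weight


for length in (1, 4, 8, 12):
    print(length, entropy_per_average_weight(length))
```

The enumeration grows exponentially with the path length, so this approach is only meant for small values of l.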



IV. PROBLEM STATEMENT 

We now come to the key topic of this paper: the maximiza- 
tion of the entropy rate of input processes for general DNCs. 

A. Maximum Entropy Rate 

Definition 3. We define the maximum entropy rate R of a 
DNC by 

$$R = \max_{X} H(X) \qquad (11)$$

where the maximum is taken over all Markovian processes X 
that generate valid input sequences for the DNC. 

Note that in [5], the term probabilistic capacity was used 
instead of maximum entropy rate. However, we prefer the 
latter term. 

The entropy rate H(X) is maximized if each term of the 
sequence on the right hand side of (10) is maximized. For 
each l, the maximum entropy per average branch weight 

$$R_l = \max_{p_{X^{(l)}}} \frac{H(X^{(l)})}{L_l} \qquad (12)$$

is given by the greatest positive real solution s of the equation 

$$\sum_{x \in \mathcal{X}^{(l)}} e^{-w(x) s} = 1. \qquad (13)$$

In addition, for all $x \in \mathcal{X}^{(l)}$, the PMF of $X^{(l)}$ that achieves 
this rate is uniquely given by 

$$q_{X^{(l)}}(x) = e^{-w(x) R_l}. \qquad (14)$$
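
The relation between (12), (13), and (14) can be verified numerically. The sketch below solves (13) by bisection for an assumed list of path weights, builds the PMF (14), and checks that its entropy per average weight equals the solution. The weights [2, 3, 3, 4] (all two-branch paths over symbols of weight 1 and 2), the function names, and the bisection bracket are illustrative assumptions.

```python
import math


def solve_rate(weights, lo=0.0, hi=64.0, iterations=200):
    """Bisection for the real s with sum(exp(-w * s) for w in weights) == 1.

    Assumes at least two paths with strictly positive weights, so that the
    sum is strictly decreasing in s, exceeds 1 at s = lo and is below 1 at
    s = hi.
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if sum(math.exp(-w * mid) for w in weights) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def entropy_per_average_weight(pmf, weights):
    """Return H / L for a PMF over paths with the given weights (in nats)."""
    entropy = -sum(p * math.log(p) for p in pmf if p > 0)
    avg_weight = sum(p * w for p, w in zip(pmf, weights))
    return entropy / avg_weight


path_weights = [2.0, 3.0, 3.0, 4.0]  # hypothetical weights of the paths in X^(2)
rate = solve_rate(path_weights)
optimal_pmf = [math.exp(-w * rate) for w in path_weights]  # the PMF in (14)
uniform_pmf = [1.0 / len(path_weights)] * len(path_weights)

print(rate, entropy_per_average_weight(optimal_pmf, path_weights))
print(entropy_per_average_weight(uniform_pmf, path_weights))
```

For unequal path weights the uniform PMF gives a strictly smaller ratio, which is why maximizing the entropy alone is not sufficient here.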

These two properties of $R_l$ were derived by using Lagrange 
multipliers in [11] and they were independently derived in [12] 
by using the bound $\ln x \leq x - 1$. We offer an alternative proof 
by applying the information inequality [13], which states for 
the Kullback-Leibler distance $D(\cdot \| \cdot)$ of two PMFs p and q 
that 

$$D(p \| q) \geq 0 \qquad (15)$$

with equality if and only if p = q. We thus have 

$$0 \geq -D(p_{X^{(l)}} \| q_{X^{(l)}}) \qquad (16)$$
$$= \sum_{x \in \mathcal{X}^{(l)}} p_{X^{(l)}}(x) \ln \frac{q_{X^{(l)}}(x)}{p_{X^{(l)}}(x)} \qquad (17)$$
$$= H(X^{(l)}) - R_l L_l \qquad (18)$$

which implies 

$$\frac{H(X^{(l)})}{L_l} \leq R_l \qquad (19)$$

with equality if and only if $p_{X^{(l)}} = q_{X^{(l)}}$. Combining (10), 
(11), and (12), we have 

$$R = \limsup_{l \to \infty} R_l = \limsup_{l \to \infty} \max_{p_{X^{(l)}}} \frac{H(X^{(l)})}{L_l}. \qquad (20)$$

The form on the right hand side of (20) allows us to compare 
the maximum entropy rate of a DNC to its combinatorial 
capacity as given in (4). We illustrate this in the following 
by two simple examples. 

Example 2. Let $\mathsf{A} = (A, w)$ represent a DNC that accepts all 
binary input sequences. The set A is thus given by $A = \{0, 1\}^*$ 
where * denotes the regular operation star. We assume the 
symbol weights $w(0) = w(1) = 1$. The combinatorial capacity 
is given by 

$$C = \limsup_{k \to \infty} \frac{\ln N(\nu_k)}{\nu_k} \qquad (21)$$
$$= \limsup_{k \to \infty} \frac{\ln 2^k}{k}. \qquad (22)$$

To calculate the maximum entropy rate of $\mathsf{A}$, we note that 
for each $x \in \mathcal{X}^{(l)}$, we have $w(x) = l$ and in addition, the 
cardinality of $\mathcal{X}^{(l)}$ is given by $|\mathcal{X}^{(l)}| = 2^l$. The average weight 
$L_l$ of $X^{(l)}$ is thus given by $L_l = l$ and maximizing the entropy 
rate reduces to maximizing the entropy of $X^{(l)}$. The maximum 
entropy of $X^{(l)}$ is given by $\max_{p_{X^{(l)}}} H(X^{(l)}) = \ln |\mathcal{X}^{(l)}|$, see 
[13]. All together we have 

$$R = \limsup_{l \to \infty} \max_{p_{X^{(l)}}} \frac{H(X^{(l)})}{L_l} \qquad (23)$$
$$= \limsup_{l \to \infty} \frac{\max_{p_{X^{(l)}}} H(X^{(l)})}{l} \qquad (24)$$
$$= \limsup_{l \to \infty} \frac{\ln |\mathcal{X}^{(l)}|}{l} \qquad (25)$$
$$= \limsup_{l \to \infty} \frac{\ln 2^l}{l}. \qquad (26)$$

We see from (22) and (26) that the maximum entropy rate of 
$\mathsf{A}$ is equal to the combinatorial capacity, that is, R = C. ◁

Example 3. As in Example 2, we consider a DNC $\mathsf{A} = (A, w)$ 
that accepts all binary input sequences. However, we assume 
the symbol weights $w(0) = 1$ and $w(1) = 2$. To show that 
C = R also holds in this case, we have to explicitly calculate 
C and R. To show equality by comparison as we did by (22) 
and (26) in the previous example is no longer possible. To 
calculate the combinatorial capacity, we write the generating 
function of $\mathsf{A}$ as 

$$G_{\mathsf{A}}(s) = \sum_{m=0}^{\infty} \left( e^{-1s} + e^{-2s} \right)^m. \qquad (27)$$

The series converges if $\Re\{e^{-1s} + e^{-2s}\} < 1$; therefore, 
the combinatorial capacity C is by Theorem 1 given by the 
smallest positive real solution of 

$$e^{-1s} + e^{-2s} = 1. \qquad (28)$$

Let Y denote a random variable with support {0, 1}, and the 
associated weights $w(0) = 1$ and $w(1) = 2$. In addition, let L 
denote the average weight of Y. The maximum entropy rate 
of $\mathsf{A}$ can then be calculated as 

$$R = \limsup_{l \to \infty} \max_{p_{X^{(l)}}} \frac{H(X^{(l)})}{L_l} \qquad (29)$$
$$= \limsup_{l \to \infty} \max_{p_Y} \frac{l \, H(Y)}{l \, L} \qquad (30)$$
$$= \max_{p_Y} \frac{H(Y)}{L}. \qquad (31)$$

By (13), it follows from the last line that R is also given by 
(28), thus R = C. ◁
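
The equality in Example 3 can also be checked by brute force. The sketch below maximizes H(Y)/L over a grid of values for P[Y = 0] and compares the result with the real solution of (28), which can be written in closed form as the logarithm of the golden ratio. The grid resolution and the function name are arbitrary choices made for illustration.

```python
import math


def entropy_per_average_weight(p_zero):
    """Return H(Y) / L for P[Y = 0] = p_zero, with w(0) = 1 and w(1) = 2."""
    p_one = 1.0 - p_zero
    entropy = -(p_zero * math.log(p_zero) + p_one * math.log(p_one))
    avg_weight = 1.0 * p_zero + 2.0 * p_one
    return entropy / avg_weight


# Grid search over the open interval (0, 1); the step size is an assumption.
grid = [k / 10000 for k in range(1, 10000)]
best_p = max(grid, key=entropy_per_average_weight)
best_rate = entropy_per_average_weight(best_p)

# Substituting x = exp(-s) turns (28) into x + x^2 = 1.
x = (math.sqrt(5) - 1) / 2
capacity = -math.log(x)

print(best_p, best_rate)
print(x, capacity)
```

The maximizing probability agrees with $e^{-s}$ evaluated at the solution of (28), as predicted by (14).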

V. MAIN RESULT 
Based on the concepts introduced in the previous sections, 
we can now state our main result. 

Theorem 2. If the set of valid input sequences of a DNC $\mathsf{A} = 
(A, w)$ can be generated by a Markov process (or equivalently, 
if the DNC can be represented by a tree where each branch 
has a subsequent branch), then the maximum entropy rate R 
of $\mathsf{A}$ is equal to its combinatorial capacity C, that is, 

$$\limsup_{k \to \infty} \frac{\ln N(\nu_k)}{\nu_k} = \limsup_{l \to \infty} \max_{p_{X^{(l)}}} \frac{H(X^{(l)})}{L_l}. \qquad (32)$$

We will prove this equality in the following. Although 
equality was shown in [5] for regular DNCs, to the best of our 
knowledge nobody has addressed the non-regular case until 
now. 

Proof of Theorem 2: To prove the theorem, we show that 
the region of convergence of the generating function $G_{\mathsf{A}}(s)$ 
is given by $\Re\{s\} > R$. The theorem then follows because of 
Theorem 1. 

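The maximum entropy rate R is given by (20), which is 
equivalent to the following. For every $\epsilon > 0$, it holds that 

$$R_l < R + \epsilon \quad \text{almost everywhere (a.e.)} \qquad (33)$$

and 

$$R_l > R - \epsilon \quad \text{infinitely often (i.o.)} \qquad (34)$$

with respect to $l \in \mathbb{N}$ (the set of natural numbers). Since $R_l$ 
is given by (13), this implies further 

$$\sum_{x \in \mathcal{X}^{(l)}} e^{-w(x)[R + \epsilon]} \leq \sum_{x \in \mathcal{X}^{(l)}} e^{-w(x) R_l} = 1 \quad \text{a.e.} \qquad (35)$$

and 

$$\sum_{x \in \mathcal{X}^{(l)}} e^{-w(x)[R - \epsilon]} \geq \sum_{x \in \mathcal{X}^{(l)}} e^{-w(x) R_l} = 1 \quad \text{i.o.} \qquad (36)$$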
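Because of (8), we can write the generating function as 

$$G_{\mathsf{A}}(s) = \sum_{a \in A} e^{-w(a) s} \qquad (37)$$
$$= \lim_{n \to \infty} \sum_{l=1}^{n} \sum_{x \in \mathcal{X}^{(l)}} e^{-w(x) s} \qquad (38)$$

and we can use (35) and (36) to give bounds on $G_{\mathsf{A}}(s)$ around 
s = R. It follows directly from (36) that 

$$\sum_{l=1}^{\infty} \sum_{x \in \mathcal{X}^{(l)}} e^{-w(x)[R - \epsilon]} = \infty. \qquad (39)$$

For every $\epsilon > 0$, the generating function $G_{\mathsf{A}}(s)$ thus diverges 
for $\Re\{s\} < R - \epsilon$. It remains to show that it converges 
whenever $\Re\{s\} > R$. For some arbitrary but fixed $\epsilon_0 > 0$, 
define 

$$D = \sum_{\{l \in \mathbb{N} \mid R + \epsilon_0 < R_l\}} \sum_{x \in \mathcal{X}^{(l)}} e^{-w(x)[R + \epsilon_0]}. \qquad (40)$$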

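Because of (33), the sum is taken over a finite number of 
terms, and as a result, D is a finite number. For every $\epsilon$ with 
$\epsilon_0 > \epsilon > 0$, we have 

$$\sum_{l=1}^{n} \sum_{x \in \mathcal{X}^{(l)}} e^{-w(x)[R + 2\epsilon]} = \sum_{l=1}^{n} \sum_{x \in \mathcal{X}^{(l)}} e^{-w(x)\epsilon} e^{-w(x)[R + \epsilon]} \qquad (41)$$
$$\leq \sum_{l=1}^{n} \sum_{x \in \mathcal{X}^{(l)}} e^{-\nu_l \epsilon} e^{-w(x)[R + \epsilon]} \qquad (42)$$
$$= \sum_{l=1}^{n} e^{-\nu_l \epsilon} \sum_{x \in \mathcal{X}^{(l)}} e^{-w(x)[R + \epsilon]} \qquad (43)$$
$$\leq \sum_{l=1}^{n} e^{-\nu_l \epsilon} \cdot 1 + D \qquad (44)$$
$$= \sum_{l=1}^{n} e^{-\nu_l \epsilon} + D. \qquad (45)$$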


The inequality in (42) holds because for every $l \in \mathbb{N}$, the 
weight of $x \in \mathcal{X}^{(l)}$ is lower bounded by $w(x) \geq \nu_l$. We have 
inequality in (44) because of $\exp(-\nu_l \epsilon) < 1$ and (33). For 
those l for which $R_l < R + \epsilon$ does not apply, we add the 
correcting value D as defined in (40). We can now write the 
sum in (45) as 

$$\sum_{l=1}^{n} e^{-\nu_l \epsilon} = \sum_{l=1}^{n} \left( e^{-\epsilon} \right)^{\nu_l}. \qquad (46)$$

For n tending to infinity, according to Lemma 1 this series 
converges, since $\{\nu_k\}_{k=1}^{\infty}$ is not too dense and since $\exp(-\epsilon) < 
1$. We conclude that $G_{\mathsf{A}}(s)$ converges for $\Re\{s\} > R + 2\epsilon$. 

If, for every $\epsilon > 0$, $G_{\mathsf{A}}(s)$ diverges for $\Re\{s\} < R - \epsilon$ and 
converges for $\Re\{s\} > R + \epsilon$, then the region of convergence 
of $G_{\mathsf{A}}(s)$ is given by $\Re\{s\} > R$. This concludes the proof of 
the theorem. ■ 



VI. CONCLUSIONS 

In this work, we showed that the equality of the combinatorial 
capacity and the maximum entropy rate of an input 
process holds for constrained systems in general and is not a 
consequence of regular constraints, which were considered in 
this context until now. In contrast to the proof of [3, Theorem 
8] in [5] for the regular case, our proof for the general case is 
not constructive, so it remains a challenge to explicitly define 
capacity achieving input sources for constrained systems with 
non-regular constraints such as the one considered in [6]. 